Simultaneously in Several Items


Robin Shealy and William Stout¹


Department of Statistics


April 25, 1991


Prepared for the Cognitive Science Research Program, Cognitive and Neural Sciences
Division, Office of Naval Research, under grant number N00014-90-J-1940, 4421-548.
Approved for public release; distribution unlimited. Reproduction in whole or in part is
permitted for any purpose of the United States Government.


¹ The research reported here is collaborative in every respect and the order of authorship
is alphabetical.




REPORT DOCUMENTATION PAGE
Form Approved, OMB No. 0704-0188

1a. Report security classification: Unclassified
3. Distribution/availability of report: Approved for public release; distribution unlimited
4. Performing organization report number: 1991 - #3
6a. Name of performing organization: University of Illinois, Department of Statistics
6c. Address: 101 Illini Hall, 725 S. Wright St., Champaign, IL 61820
7a. Name of monitoring organization: Cognitive Science Program, Office of Naval Research
7b. Address: 800 N. Quincy, Arlington, VA 22217-5000
8a. Name of funding/sponsoring organization: Cognitive Science Program, Office of Naval Research, Code 1142
8c. Address: 101 Illini Hall, 725 S. Wright St., Champaign, IL 61820
9. Procurement instrument identification number: N00014-90-J-1940
10. Source of funding numbers: program element no. 61153N; project no. RR04204; task no. RR04204-01; work unit accession no. 4421-
11. Title: A Procedure to Detect Item Bias Present Simultaneously in Several Items
12. Personal authors: Robin Shealy and William Stout
13a. Type of report: technical
13b. Time covered: 1988 to 1991
14. Date of report: 1991 April 25
15. Page count: 35
16. Supplementary notation: Software to carry out the procedure is available from the authors
18. Subject terms: see reverse
19. Abstract: see reverse
20. Distribution/availability of abstract: Unclassified/unlimited

DD Form 1473, JUN 86. Previous editions are obsolete. S/N 0102-LF-014-6603


ABSTRACT 


This paper presents a statistical procedure (denoted by SIB) designed to test for
unidirectional test bias existing simultaneously in several items of an ability test. It was
argued in Shealy and Stout (1991) that in order to model such bias with an IRT model, a
multidimensional model is necessary. The proposed procedure, based on this
multidimensional IRT modeling approach, statistically tests for bias in one or more items at a
time and is corrected for the inflation (or deflation) of the test statistic due to target ability
difference, a valid group difference that is conceptually independent of psychological test
bias. The correction plays the same role as the practice of including the single studied
item in the “matching criterion” score in the Mantel-Haenszel (MH) procedure adapted
for test responses by Holland and Thayer (1988). It is shown through the initial portion of
an extensive simulation study underway (Shealy (1991)) that, with the correction in place,
the procedure performs as well as the MH procedure in many cases when there is a single
biased item, and performs well in the case of multiple item test bias.


Key Words: item bias, test bias, DIF, latent trait theory, item response theory, target
ability, valid subtest, nuisance determinants, potential for bias, expressed bias, unidirectional
test bias, bidirectional test bias, SIB, Mantel-Haenszel.


INTRODUCTION 


The purpose of this paper is to present a statistical procedure (denoted by SIB for
simultaneous item bias) for detecting bias present in one or more test items of a
standardized ability test. The procedure is based on the multidimensional item response theory
(IRT) model of test bias presented in Shealy and Stout (1991). By “test bias” we mean
a formalization of the intuitive idea that a test is less valid for one group of examinees
than for another group in its attempt to assess examinee differences in a prescribed
latent trait, such as mathematics ability. Test bias is conceptualized herein as the result of
individually-biased items acting in concert through a test scoring method, such as number
correct, to produce a biased test.

Two distinct features of this conceptualization of bias are as follows. First, it provides
a mechanism for explaining how several individually-biased items can combine through a
test score to exhibit a coherent and major biasing influence at the test level. In
particular, this can be true even if each individual item displays only a minor amount of item
bias. For example, word problems on a mathematics test that are too dependent on
sophisticated written English comprehension could combine to produce pervasive test bias
against English-as-a-second-language examinees. A second feature, possible because of our
multidimensional modeling approach, is that the underlying psychological mechanism that
produces bias is addressed. This mechanism lies in the distinction made between the
ability the test is intended to measure, called the target ability, and other abilities influencing
test performance that the test does not intend to measure, called nuisance determinants.
Test bias will be seen to occur because of the presence of nuisance determinants possessed
in differing amounts by different examinee groups. Through the presence of these nuisance
determinants, bias then is expressed in one or more items.

The test bias detection procedure can simultaneously assess bias in several items,
thus addressing the above two features. In contrast, most item bias procedures detailed
in the literature perform tests on a single item at a time. The pseudo IRT procedure
of Linn and Harnisch (1981) estimates possibly group-dependent item response functions
(IRFs) without the use of item parameter estimation algorithms when the sample size is
too small for their use. Thissen, Steinberg, and Wainer (1988) employ marginal maximum
likelihood estimation to obtain group-dependent item parameters in a 3-parameter logistic
framework and use the likelihood ratio test to test the equality of the parameters across
groups. The widely used Mantel-Haenszel procedure, adapted for test response data by
Holland and Thayer (1988), employs the practice of using the score of the
entire test instead of the score of the non-studied items as the “matching criterion” to test
for item bias. Other single-item procedures exist as well. Conceivably these procedures
could be used once for each item in a set of items being tested for bias, and multiple
comparison procedures could be employed to assess the hypothesis of the entire set being
biased. However, if the amount of bias is small in each item, a multiple comparison
procedure may not pick up bias in the set of items at all. Moreover, this approach cannot
address underlying causal mechanisms of bias.

The novelty of our approach to detecting test bias lies not so much with its recognition
of the role of nuisance determinants in the expression of test bias, but rather in its explicit
use of a multidimensional model to motivate the procedure to detect it. The presence of
multidimensionality of test item responses where bias is present has long been recognized
in test and item bias studies: Lord (1980) states “if many of the items [in a test] are found
to be seriously biased, it appears that the items are not strictly unidimensional” (p. 220).
Recently, Lautenschlager and Park (1988) employed a technique of generating simulated
biased item responses using a method of Ansley and Forsyth (1985), which involves using
multidimensional item response functions (IRFs) and latent ability distributions to
determine conditional probabilities of correct response. Kok (1988), taking a multidimensional
viewpoint similar to Shealy and Stout (1991), presents a specific multidimensional IRT
model for bias where the nuisance determinants are compensating abilities, contextual
abilities such as language, and testwiseness.

An important issue addressed by our procedure is that a careful distinction is made
between genuine test bias, often operationally embodied as DIF (Holland and Thayer (1988))
by practitioners, and non-bias differences in examinee group performance, sometimes called
impact (see, for example, Ackerman (1991) for a careful discussion of impact as distinct
from bias), that are caused by examinee group differences in target ability distributions.
It is important that the latter not be mistakenly labeled as test bias. The procedure
developed herein makes this distinction in its application.


FORMULATION  OF  TEST  BIAS 


Test  bias  in  this  paper  is  modeled  using  a  multidimensional  item  response  theory 
(IRT)  model,  which  is  assumed  to  be  the  model  behind  the  observed  test  responses.  For 
purposes  of  exposition,  we  restrict  ourselves  to  the  case  where  there  is  a  single  nuisance 
determinant; this two-dimensional modeling approach is often realistic in practice. Extensions
to multiple nuisance determinants are straightforward. For a fuller treatment of the
conception  of  test  bias,  including  the  case  of  multiple  nuisance  determinants  and  item  bias 
cancellation,  in  a  more  general  framework,  see  Shealy  and  Stout  (1991)  and  Shealy  (1989). 

We  consider  two  biologically-  or  sociologically-defined  groups,  named  “reference”  and 
“focal”  groups  (after  Holland  and  Thayer’s  (1988)  naming  convention).  A  random  sample 
of  examinees  is  drawn  from  each  group,  and  a  test  of  N  items  is  administered  to  them. 
Typically  it  is  suspected  that  a  part  of  the  test  is  biased  against  the  focal  group;  this 
group  is  usually  the  object  of  the  bias  study.  The  responses  to  the  test  items  from  a 
randomly-chosen examinee are denoted $U = (U_1, \ldots, U_N)$, where each $U_i$ can take on
0 or 1, according as the response to item $i$ is incorrect or correct, respectively.

The IRT model in general is composed of two components that generate $U$: (1) a
$d$-dimensional examinee ability parameter and (2) a set of item response functions (IRFs), one
for each item, which determine the probability of correct response for the items. Here we
restrict the model to have $d = 1$ or 2, because we are considering a single nuisance
determinant in addition to the target ability. The ability vector is $(\theta, \eta)$ for an arbitrary examinee
from either group, where $\theta$ denotes target ability and $\eta$ denotes the nuisance determinant.
A distribution of $(\theta, \eta)$ over the combined group of examinees is induced by choosing
examinees at random; the variable for a randomly chosen examinee is denoted $(\Theta, \eta)$. The
IRF for item $i$ is denoted $P_i(\theta, \eta)$, and it is assumed that all items depend on $\theta$, and one
or more may depend on $\eta$; for those dependent only on $\theta$, the IRF is $P_i(\theta)$. It is implicitly
assumed that an IRT representation for $U$ in terms of $(\Theta, \eta)$ and $\{P_i(\theta, \eta) : i = 1, \ldots, N\}$
is possible; for a fuller treatment of this assumption, see Shealy (1989). In addition, it is
assumed that each $P_i(\theta, \eta)$ is increasing in $(\theta, \eta)$ when item $i$ is dependent on both abilities
and increasing in $\theta$ when it is dependent on $\theta$ alone; and that each $P_i(\theta)$ is differentiable.
Finally, local independence of $U$ given $(\Theta, \eta)$ is assumed.
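The two model components and the local independence assumption can be illustrated with a small simulation. This is only a sketch under stated assumptions: the compensatory two-dimensional logistic form of the IRF, the unit-variance normal ability distributions, the item parameters in `ITEMS`, and all function names are hypothetical choices made for illustration and are not specified in the paper.

```python
import math
import random


def irf(theta, eta, a_theta, a_eta, b):
    """Assumed compensatory logistic IRF P_i(theta, eta); increasing in both arguments."""
    return 1.0 / (1.0 + math.exp(-(a_theta * theta + a_eta * eta - b)))


# Hypothetical items as (a_theta, a_eta, b). The first three depend on theta only
# (a_eta = 0); the last one also depends on the nuisance determinant eta.
ITEMS = [(1.0, 0.0, -0.5), (1.2, 0.0, 0.0), (0.8, 0.0, 0.5), (1.0, 0.7, 0.0)]


def simulate_examinee(rng, theta_mean, eta_mean):
    """Draw (theta, eta), then draw each U_i independently given (theta, eta)."""
    theta = rng.gauss(theta_mean, 1.0)
    eta = rng.gauss(eta_mean, 1.0)
    return [1 if rng.random() < irf(theta, eta, *item) else 0 for item in ITEMS]


rng = random.Random(1991)
# Same target ability distribution in both groups; the focal group is assumed
# to have a lower mean on the nuisance determinant.
reference_sample = [simulate_examinee(rng, 0.0, 0.0) for _ in range(5)]
focal_sample = [simulate_examinee(rng, 0.0, -0.5) for _ in range(5)]
```

Each simulated response vector is a list of 0/1 values of length N, here N = 4.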

Test bias in the above-mentioned model is formulated through three components:

(a) The potential for bias, if it exists, resides within the target ability/nuisance
determinant distributions of the two groups being studied;

(b) potential for bias is expressed in items whose responses depend on the nuisance
determinant;¹ and

(c) the scoring method of the test, to be viewed as an estimate of target ability, transmits
expressed item biases into test bias.

¹ We remark that Kok’s (1988) formulation is also based upon (a) and (b); Kok’s and
our formulation were developed independently of one another.

Potential for test bias is explained prosaically in the following manner. After
conditioning on a particular $\theta$, suppose that the reference group has a higher level of nuisance
ability on average than the focal group. Then those reference group examinees with
ability $\theta$ would have an overall advantage over the corresponding focal group examinees when
responding to items at least partially dependent on the nuisance determinant $\eta$ (this follows
from the monotonicity of the items' IRFs $P_i(\theta, \eta)$). Formally, we define the potential
for test bias at $\theta$:

Definition 1. Potential for test bias exists against the focal group at target ability level $\theta$
with respect to $\eta$ if $\eta \mid \Theta = \theta, G = F$ is stochastically less than $\eta \mid \Theta = \theta, G = R$, where
“$G = F$” denotes sampling from the focal group and “$G = R$” sampling from the reference
group. Potential for bias exists against the reference group if the converse holds.

Note that we are restricting consideration to conditional nuisance distributions
$\eta \mid \Theta = \theta, G = R$ and $\eta \mid \Theta = \theta, G = F$ that are stochastically ordered; that is, where the
two distribution functions do not intersect. Figure 1 displays two distributions that are
stochastically ordered and also two distributions that are not.
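As a rough numerical illustration of stochastic ordering, the sketch below compares two distribution functions on a grid. The normal distributions, their parameters, and the function names are assumptions made only for this example; a grid check is a heuristic, not a proof of ordering.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Distribution function of a normal variable, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def is_stochastically_less(cdf_low, cdf_high, grid):
    """True if cdf_low(x) >= cdf_high(x) at every grid point, i.e. the
    'low' distribution puts at least as much mass below every x."""
    return all(cdf_low(x) >= cdf_high(x) for x in grid)


grid = [i / 10.0 for i in range(-60, 61)]


def focal(x):
    return normal_cdf(x, mean=-0.5)  # hypothetical nuisance distribution, focal group


def reference(x):
    return normal_cdf(x, mean=0.0)   # hypothetical nuisance distribution, reference group


def wide(x):
    return normal_cdf(x, mean=0.0, sd=2.0)  # same mean as reference, larger spread


ordered = is_stochastically_less(focal, reference, grid)
# The distribution functions of `wide` and `reference` cross at 0,
# so neither is stochastically less than the other.
crossing_low = is_stochastically_less(wide, reference, grid)
crossing_high = is_stochastically_less(reference, wide, grid)
```

Here a shift in mean gives an ordered pair, while a change in spread alone gives a pair whose distribution functions intersect.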


place  Figure  1  about  here 


In order for test bias to occur, it must be expressed in one or more items. Our definition
of expressed bias for an item, when specialized to Kok’s model, is really the same as that
of Kok (1988, p. 269). It is defined in terms of a marginalization of the multidimensional
IRF $P_i(\theta, \eta)$.

Definition 2. Let $P_i(\theta, \eta)$ be the IRF for item $i$. The marginal IRF for group $g$ ($g = R$
or $F$) with respect to target ability $\theta$ is defined as

$$T_{ig}(\theta) = E[P_i(\Theta, \eta) \mid \Theta = \theta, G = g]. \tag{1}$$

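When $\eta \mid \theta$ has a conditional density in group $g$, $f_g(\eta \mid \theta)$ say, Definition 2 translates into

$$T_{ig}(\theta) = \int_{-\infty}^{\infty} P_i(\theta, \eta)\, f_g(\eta \mid \theta)\, d\eta.$$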

Definition 3. Expressed bias for item $i$ against the focal group occurs at target ability $\theta$
if $T_{iF}(\theta) < T_{iR}(\theta)$; it occurs against the reference group if the converse holds.
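Definitions 2 and 3 can be made concrete by integrating a two-dimensional IRF against a conditional nuisance density. The following is a minimal sketch, assuming a compensatory logistic IRF and a normal conditional density for the nuisance determinant that does not depend on the target ability; these choices, the parameter values, and the function names are hypothetical.

```python
import math


def irf(theta, eta, a_theta=1.0, a_eta=0.8, b=0.0):
    """Assumed logistic IRF P_i(theta, eta), increasing in both arguments."""
    return 1.0 / (1.0 + math.exp(-(a_theta * theta + a_eta * eta - b)))


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def marginal_irf(theta, eta_mean, eta_sd=1.0, steps=2000):
    """Midpoint-rule approximation of the marginal IRF: the integral of
    P_i(theta, eta) * f_g(eta | theta) over eta, with an assumed normal density."""
    lo = eta_mean - 8.0 * eta_sd
    hi = eta_mean + 8.0 * eta_sd
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        eta = lo + (j + 0.5) * width
        total += irf(theta, eta) * normal_pdf(eta, eta_mean, eta_sd)
    return total * width


# Hypothetical groups: the focal group has the lower nuisance mean.
thetas = (-2.0, -1.0, 0.0, 1.0, 2.0)
expressed_against_focal = all(
    marginal_irf(t, eta_mean=-0.5) < marginal_irf(t, eta_mean=0.0) for t in thetas
)
```

Under these assumptions the item expresses bias against the focal group at every target ability level checked, because the IRF is increasing in the nuisance determinant and the focal group's nuisance distribution is stochastically smaller.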

A test can consist of many items simultaneously biased by the same nuisance determinant.
In this case, items can cohere and act through the prescribed test score to produce
substantial  bias  against  a  particular  group  even  if  individual  items  display  undetectably 
small  amounts  of  item  bias.  This  is  the  final  (and  novel)  component  of  our  formulation  of 
test  bias  mentioned  above.  We  consider  the  large  class  of  test  scores  of  the  form 

$$h(U) \tag{2}$$

where $h(u)$ is real valued with domain $u = (u_1, \ldots, u_N)$ such that $u_i = 0$ or 1 for
$i = 1, \ldots, N$ and $h(u)$ is coordinatewise non-decreasing in $u$. This class contains many of
the standard scoring procedures for many standard models; for example, number correct,
linear formula scoring of the form $\sum_{i=1}^{N} a_i U_i$ with $a_i > 0$, maximum likelihood estimation
of ability for certain logistic models with item parameters assumed known, etc. In this
paper we restrict attention to number correct as the test score; the results presented herein
are easily extendable to other forms of $h(u)$. The key point about number correct scoring
is that each item is weighted equally. Thus, if a subset of the items is suspected of bias,
we should give equal weight to the items in this “studied” subtest in our attempt to
quantitatively assess the amount of test bias resulting from the simultaneous influence of
these items. We thus define test bias for a specified studied subtest of items as follows:

Definition 4. Let $\{U_{i_1}, \ldots, U_{i_b}\}$ be any subtest of items to be studied for bias from
the test of concern and define

$$h(U) = \sum_{j=1}^{b} U_{i_j}. \tag{3}$$

Then this studied subtest of items displays test bias against the focal group at $\theta$ if

$$E[h(U) \mid \Theta = \theta, G = F] < E[h(U) \mid \Theta = \theta, G = R].$$

The subtest is biased against the reference group if the converse holds.

Finally, the components of the bias formulation can be integrated using the following
theorem, adapted from Theorem 4.2 in Shealy and Stout (1991):

Theorem 1. Fix a target ability $\theta$ and choose the subtest scoring method $h(u)$ of the
form (3). Assume potential for bias against the focal group at $\theta$ holds (Definition 1). Then
test bias exists against the focal group; i.e.,

$$E\Big[\sum_{j=1}^{b} U_{i_j} \,\Big|\, \Theta = \theta, G = F\Big] < E\Big[\sum_{j=1}^{b} U_{i_j} \,\Big|\, \Theta = \theta, G = R\Big]. \tag{4}$$

In order to test for bias of the above form, there must be an implicit assumption that a
portion of the test measures only the target ability; otherwise, a conditional-on-observed
score procedure to detect bias is not possible. This set of items will be denoted the valid
subtest. The issue of the existence and identification of a valid subtest is extremely difficult
to frame philosophically (it is really an issue of construct validity) and must primarily be
an empirical decision based on expert opinion or data at least in part external to the test
being studied; it is not dealt with here. For a fuller discussion, see Shealy and Stout (1991).
For notational simplicity we take the valid subtest to consist of the first $n < N$ items of
the test, and we call the remaining $N - n$ items the studied subtest. We note that
use of a valid subtest is operationally equivalent to making use of a subset of items whose
purpose is to partition examinees into “comparable” sets as is done in the MH procedure
described below and other DIF procedures. Hence, the proposed use of a valid subtest in
the SIB procedure can be interpreted either in the strong sense of our test bias paradigm
or in the weak sense of the DIF paradigm (of matching of “comparable” examinees). Thus
use of our statistical procedure for assessing bias in no way requires acceptance of our bias
framework as opposed to a “comparability” framework, where no claims about “bias” are
made.

Using the above conventions, the specification of test bias against the focal group at
$\theta$ becomes

$$T_F(\theta) = \sum_{i=n+1}^{N} T_{iF}(\theta) < \sum_{i=n+1}^{N} T_{iR}(\theta) = T_R(\theta) \tag{5}$$

because $T_{ig}(\theta) = E[U_i \mid \Theta = \theta, G = g]$ by a simple application of a standard conditioning
formula to Definition 2. $T_g(\theta)$ is called the studied subtest response function for group $g$.

Unidirectional  test  bias 

Test bias heretofore has been considered conditional on a single target ability; we now
turn to a global perspective. If there is test bias against the same group for all $\theta$, then
there is unidirectional bias against this group. Specifically, if

$$B(\theta) = T_R(\theta) - T_F(\theta)$$

is the level of bias against Group $F$ at $\theta$, then unidirectional bias holds if either $B(\theta) > 0$
for all $\theta$ or $B(\theta) < 0$ for all $\theta$. A strong form of unidirectional bias, termed uniform
bias by Mellenbergh (1982), is the type of bias that the modified Mantel-Haenszel test
statistic devised by Holland and Thayer (1988) is designed to detect. Although the
Mantel-Haenszel approach is not dependent on an IRT framework, it can be put in a Rasch
model IRT framework, with the single biased item having group-dependent item difficulties.
Here, the bias is “uniform” in the sense that $T_F(\theta)$ is merely $T_R(\theta)$ shifted horizontally.
Unidirectional bias is less restrictive in that $T_g(\theta)$ does not have to be a logistic IRF, and
more importantly, $T_R(\theta)$ does not have to be $T_F(\theta)$ shifted.

Since we are concerned with bias against the focal group, it is intuitive that a suitable
theoretical unidirectional bias index is

$$\beta_U = \int_{-\infty}^{\infty} B(\theta)\, f_F(\theta)\, d\theta, \tag{6}$$

where $f_F(\theta)$ is the probability density function of $\Theta$ for the focal group. Equivalent
indices weighted by the reference target ability distribution and the combined-group target
distribution are easily conceptualized.
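To make the index concrete, the sketch below evaluates a focal-density-weighted integral of the bias function numerically. It assumes logistic studied subtest response functions that differ by a horizontal shift (uniform bias) and a standard normal focal target ability density; these choices and all names are hypothetical.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def beta_u(bias, focal_pdf, lo=-8.0, hi=8.0, steps=4000):
    """Midpoint-rule approximation of the integral of B(theta) * f_F(theta)."""
    width = (hi - lo) / steps
    total = 0.0
    for j in range(steps):
        theta = lo + (j + 0.5) * width
        total += bias(theta) * focal_pdf(theta)
    return total * width


def t_ref(theta):
    return logistic(theta)        # hypothetical T_R


def t_foc(theta):
    return logistic(theta - 0.3)  # hypothetical T_F: T_R shifted horizontally


def bias(theta):
    return t_ref(theta) - t_foc(theta)  # B(theta) > 0 for all theta


uniform_bias_index = beta_u(bias, normal_pdf)
constant_bias_index = beta_u(lambda theta: 0.1, normal_pdf)
```

With a constant bias of 0.1 at every ability level the index is approximately 0.1, which gives a quick check on the integration.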

THE  BASIC  PROCEDURE 

The statistical procedure to be presented is based on (6); the hypothesis is

$$H: \beta_U = 0 \quad \text{vs.} \quad \beta_U > 0,$$

the alternative being one-sided to specifically test for bias against the focal group. The
test statistic to be constructed is essentially an estimate of $\beta_U$ normalized to have unit
variance. The estimate of $\beta_U$ is derived first.

Since test bias is analyzed using number correct on the studied subtest, set

$$Y = \sum_{i=n+1}^{N} U_i \tag{7}$$

to be the studied subtest score; also set $X = \sum_{i=1}^{n} U_i$ to be the valid subtest score. In
selecting the valid subtest score to be number correct, we follow the convention set out in
Holland and Thayer (1988), among many others. Other choices would of course be possible
and could improve the performance of the procedure.

The naive intuition is that examinees with the same valid subtest score are examinees
of approximately equal target ability and thus such examinees are directly comparable in
the assessment of bias. Thus the difference

$$\bar{Y}_{Rk} - \bar{Y}_{Fk}, \quad k = 0, \ldots, n, \tag{8}$$

where $\bar{Y}_{gk}$ is the average $Y$ for all examinees in group $g$ attaining valid subtest score $X = k$,
should provide a measure of the bias against the focal group (resulting from the reference
group having superior nuisance ability $\eta$ on average). In particular, if there is no bias ($H$
holds), then $\bar{Y}_{Rk} - \bar{Y}_{Fk} \approx 0$ for all $k$ should be observed, and if there is unidirectional
bias against the focal group ($B(\theta) > 0$ for all $\theta$) then $\bar{Y}_{Rk} - \bar{Y}_{Fk} > 0$ for all $k$, except for
statistical error, should be observed.

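The above assertion needs support; it will suffice to argue that

$$\begin{aligned}
E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] &= 0 \text{ for all } k \text{ if } B(\theta) = 0 \text{ for all } \theta, \text{ and} \\
E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] &> 0 \text{ for all } k \text{ if } B(\theta) > 0 \text{ for all } \theta.
\end{aligned} \tag{9}$$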

For now we restrict the target ability distributions to be equal for the two groups; i.e.,
$\Theta \mid G = R$ and $\Theta \mid G = F$ have the same distribution. It is easy to prove (following (5))
under the model presented herein that

$$E[\bar{Y}_{gk}] = E[Y \mid X = k, G = g] = E[T_g(\Theta) \mid X = k, G = g]. \tag{10}$$

Now assume that the valid subtest is long enough so that the distribution of $\Theta \mid X = k$,
$G = g$ is tightly concentrated about its mean, and hence that $T_g(\theta)$ is locally flat within
the range of $\theta$ where the distribution of $\Theta \mid X = k, G = g$ mostly resides. Then

$$E[T_g(\Theta) \mid X = k, G = g] \approx T_g(E[\Theta \mid X = k, G = g]) = T_g(E[\Theta \mid X = k]), \tag{11}$$

because the two target ability distributions are equal and expectation is a linear operator.
Thus, denoting $\theta_k = E[\Theta \mid X = k]$,

$$E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] \approx B(\theta_k). \tag{12}$$

Thus (9) follows easily; the $n + 1$ differences in (8) provide an estimate of $B(\theta)$ at $n + 1$
points in the $\theta$-domain. It is intuitive that an estimate of $\beta_U$ is

$$\hat{\beta}_U = \sum_{k=0}^{n} \hat{p}_k (\bar{Y}_{Rk} - \bar{Y}_{Fk}) \tag{13}$$

where $\hat{p}_k$ is the proportion (among focal group examinees) attaining $X = k$. Specifically,
if $J_{gk}$ is the number of examinees in group $g$ attaining $X = k$, then $\hat{p}_k = J_{Fk} / \sum_{j=0}^{n} J_{Fj}$.
In the case where the target ability distributions are the same, it is then
straightforward that

$$E[\hat{\beta}_U] \approx \sum_{k=0}^{n} p_k B(\theta_k) \approx \beta_U, \tag{14}$$

where $p_k = P[X = k \mid G = F]$. Thus the expected value of $\hat{\beta}_U$ is a weighted difference
of marginal IRFs, this weighted difference approximating $\beta_U$, which is a continuously
weighted difference of marginal IRFs. From (14), it follows that $E\hat{\beta}_U = 0$ if $\beta_U = 0$, and
$E\hat{\beta}_U > 0$ if $\beta_U > 0$. This suggests the standardized test statistic

$$B = \frac{\hat{\beta}_U}{\hat{\sigma}(\hat{\beta}_U)} \tag{15}$$

for testing $H$, where the denominator is defined as

$$\hat{\sigma}(\hat{\beta}_U) = \left( \sum_{k=0}^{n} \hat{p}_k^2 \left( \frac{\hat{\sigma}^2(Y \mid k, R)}{J_{Rk}} + \frac{\hat{\sigma}^2(Y \mid k, F)}{J_{Fk}} \right) \right)^{1/2}, \tag{16}$$

where $\hat{\sigma}^2(Y \mid k, g)$ is the sample variance of the studied subtest scores of those group $g$
examinees with valid subtest score $k$. A full description of the computation of the test
statistic, with contingencies for exclusion of certain valid subtest scores based on inadequate
examinee counts, is presented in the Appendix. $B$ is approximately standard normal when
$\beta_U = 0$ and the target ability distributions are the same, because $\hat{\beta}_U$ is the weighted sum
of approximately normal random variables $\bar{Y}_{Rk} - \bar{Y}_{Fk}$; these are approximately normal (for
suitable sample sizes) by the central limit theorem (proof of asymptotic normality of $B$
omitted).
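The uncorrected estimate and its standardization can be sketched as follows. This is an illustration under assumptions, not the authors' software: the Appendix rules for excluding valid subtest scores are not reproduced here, so the sketch simply keeps score levels with at least `min_count` examinees in each group, renormalizes the focal weights over the retained levels, and uses the n - 1 sample variance from Python's `statistics` module. The function name and the toy data are hypothetical.

```python
import math
from statistics import mean, variance


def sib_statistic(reference, focal, min_count=2):
    """reference, focal: dict mapping valid subtest score k to the list of
    studied subtest scores Y of the examinees in that group with X = k.
    Returns (beta_hat, B)."""
    cells = [
        k for k in reference
        if k in focal
        and len(reference[k]) >= min_count
        and len(focal[k]) >= min_count
    ]
    focal_total = sum(len(focal[k]) for k in cells)
    beta_hat = 0.0
    var_hat = 0.0
    for k in cells:
        p_k = len(focal[k]) / focal_total  # focal proportion at X = k
        beta_hat += p_k * (mean(reference[k]) - mean(focal[k]))
        var_hat += p_k ** 2 * (
            variance(reference[k]) / len(reference[k])
            + variance(focal[k]) / len(focal[k])
        )
    # Raises ZeroDivisionError if every retained cell has zero variance.
    return beta_hat, beta_hat / math.sqrt(var_hat)


# Toy data: two valid subtest score levels, studied subtest scores in {0, 1, 2}.
ref_cells = {0: [0, 1, 1], 1: [1, 2, 2, 1]}
foc_cells = {0: [0, 0, 1], 1: [1, 1, 0, 2]}
beta_u_hat, b_stat = sib_statistic(ref_cells, foc_cells)
```

For the toy data the weights are 3/7 and 4/7, the cell mean differences are 1/3 and 1/2, so the estimate is 3/7 and the standardized statistic is 3 divided by the square root of 6. A one-sided test would compare the statistic with an upper standard normal quantile.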

The  regression  correction  for  target  ability  difference 

The presence of a difference in target ability distributions in test bias studies has been
treated in various contexts in the literature. The issue of the linking of metrics across groups
in the estimation of IRT item parameters is one such context (see Linn et al. (1981) for an
IRT item bias approach where linking of metrics is crucial). Holland and Thayer (1988)
also deal with this problem by including the single studied item in the matching criterion
score of the Mantel-Haenszel test; they prove that this method completely compensates
for target ability difference (in their context, the distributional difference in the postulated
unidimensional latent trait) when the underlying IRT model is a Rasch model. Millsap
and Meredith (1989) elegantly formulate the problem in terms of a divergence of two
hypotheses (a “conditional on observed score” hypothesis and a “latent trait” hypothesis),
which would occur if target ability difference is present. A “conditional on observed score”
procedure such as (15) in its present form is not adequate to address the separation of
target ability difference from test bias; the presence of target ability difference when in
fact there is no test bias present can statistically inflate $B$, thereby suggesting test bias
actually is present. It is therefore necessary to formulate a correction for target ability
difference.



To motivate the proposed correction it is necessary to show that a decomposition of the
differences $\bar{Y}_{Rk} - \bar{Y}_{Fk}$ into “test bias only” and “target ability difference only” components
is possible. First we note that by similar arguments to those used in deriving (10) and (11),

$$E[\bar{Y}_{gk}] \approx T_g(\theta_{gk}), \tag{17}$$

where $\theta_{gk} = E[\Theta \mid X = k, G = g]$. The condition $E[\bar{Y}_{Rk} - \bar{Y}_{Fk}] = 0$ requires $\theta_{Rk} = \theta_{Fk}$, as in (11)
where $g$ was removed from the conditioning; but this may not happen if the target ability
distributions are not the same, as Figure 2 suggests. Figure 2, which displays densities
for four distributions, assumes that the distribution of $\Theta \mid F$ is stochastically smaller than
that of $\Theta \mid R$.


place  figure  2  about  here 


Note that the (conditional) distribution of $\Theta \mid k, F$ is stochastically smaller than that
of $\Theta \mid k, R$ for all $k$. The standard Bayesian calculation makes this insight rigorous. Thus,
$\theta_{Fk} < \theta_{Rk}$ for all $k$, and, in the absence of bias, where $T_R(\theta) = T_F(\theta) = T(\theta)$ for all $\theta$,

$$E\bar{Y}_{Fk} \approx T(\theta_{Fk}) < T(\theta_{Rk}) \approx E\bar{Y}_{Rk}$$

($T(\theta)$ is assumed monotone; for mild conditions giving such monotonicity, see Shealy and
Stout (1991)). Thus

$$E\hat{\beta}_U \approx \sum_{k=0}^{n} p_k \big(T(\theta_{Rk}) - T(\theta_{Fk})\big) > 0.$$

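In the case where bias is present, we can thus decompose $E[\hat{\beta}_U]$:

$$\begin{aligned}
E[\hat{\beta}_U] &\approx \sum_{k=0}^{n} p_k \big(T_R(\theta_{Rk}) - T_F(\theta_{Rk})\big) + \sum_{k=0}^{n} p_k \big(T_F(\theta_{Rk}) - T_F(\theta_{Fk})\big) \\
&= \sum_{k=0}^{n} p_k B(\theta_{Rk}) + \sum_{k=0}^{n} p_k T_F'(\tilde{\theta}_k)(\theta_{Rk} - \theta_{Fk}),
\end{aligned} \tag{18}$$

where $\tilde{\theta}_k$ is between $\theta_{Rk}$ and $\theta_{Fk}$. ($T_F(\theta)$ is assumed differentiable here and the mean
value theorem has been applied.) The first term is due only to test bias; the second is due
only to target ability difference.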




This approximate decomposition argument is the motivation behind the proposed
correction. Our strategy is to adjust $\bar{Y}_{gk}$ to $\bar{Y}^{*}_{gk}$ such that the inflating effect of
the group differences in target ability is eliminated. This is accomplished by
constructing $\bar{Y}^{*}_{Rk}$ and $\bar{Y}^{*}_{Fk}$ so that they estimate the studied subtest response functions
$T_R(\theta)$ and $T_F(\theta)$ at approximately the same target ability $\theta_k$, defined below (as opposed
to two different ones, as is evident from (17)).

A natural attempt to make adjustments to $\bar{Y}_{Rk}$ and $\bar{Y}_{Fk}$ is to approximate $T_R(\theta)$ and
$T_F(\theta)$ in the neighborhood of $\theta_{Rk}$ and $\theta_{Fk}$ by linear functions. If we assume that $\theta_{Rk}$ and
$\theta_{Fk}$ are sufficiently close together to do this, $T_R(\theta)$ and $T_F(\theta)$ can be linearly interpolated
at $\theta_k = \tfrac{1}{2}(\theta_{Rk} + \theta_{Fk})$:

$$T_g(\theta_k) = T_g(\theta_{gk}) + m_{gk} (\theta_k - \theta_{gk}) \tag{19}$$

where

$$m_{gk} = \frac{T_g(\theta_{g,k+1}) - T_g(\theta_{g,k-1})}{\theta_{g,k+1} - \theta_{g,k-1}};$$

however, though estimates of $T_g(\theta_{gk})$ (namely, $\bar{Y}_{gk}$) are available for all $k$, estimates for
$\{\theta_{gk} : k = 0, \ldots, n\}$ are not. Abilities on the $\theta$-scale are not observable; however, one can
estimate abilities on the scale defined by the valid subtest, namely

$$v = \bar{P}(\theta),$$

where $\bar{P}(\theta) = \frac{1}{n} \sum_{i=1}^{n} P_i(\theta)$ is the average of the valid subtest IRFs. $\bar{P}(\Theta) \mid G = g$ is the
true score for a randomly chosen group $g$ examinee, i.e., the valid subtest true score $\bar{P}(\Theta)$
for group $g$. Let

$$V_g(x) = E[\bar{P}(\Theta) \mid X = x, G = g], \tag{20}$$

the (theoretical) regression of true on observed (here, valid) score. $V_g(x)$ can be easily
estimated using classical true score theory, assuming that the above regression is linear or
nearly so. The estimation of $V_g(x)$ is deferred to the appendix. Denote this estimator by
$\hat{V}_g(x)$.

At this point it is expedient to describe three latent scales, which must be simultaneously
considered in order to understand the correction. Figure 3 delineates the three
scales and should be referred to frequently.




place  figure  3  about  here 


So, the interpolation of (19) must be transformed so as to use the easily estimable
$V_g(k)$ instead of $\theta_{gk}$. Through the monotonic transformation $\bar{P}(\theta)$, $V_g(k)$ and $\theta_{gk}$ represent
approximately (“approximately” because $\bar{P}(\theta_{gk}) \approx V_g(k)$ will be demonstrated below)
the same ability on two different latent scales and are thus, for our purposes, interchangeable.
Note that $s = T_g(\theta)$ defines a monotonic transformation from the fundamental latent
scale to the studied subtest scale, and $v = \bar{P}(\theta)$ defines one from the fundamental scale
to the valid subtest scale. $T_g(\theta)$ must be transformed so we can use the valid subtest
scale as domain, because abilities on this scale can be estimated. Figure 4 illustrates the
appropriate correspondence,


place  figure  4  about  here 


thus defining a new transformation $S_g(v) = T_g(\bar{P}^{-1}(v))$ from valid subtest scale to studied
subtest scale, with domain $(c, 1)$ and range $(c, 1)$ ($c > 0$ is the guessing parameter, assumed
common for all items in the test).

With this transformation in hand, the correction can be performed in the following
manner. First, by the same arguments as used in (10) and (11), using $\bar{P}(\theta)$ in place of
$T_g(\theta)$ in the arguments,

$$V_g(k) \approx \bar{P}(\theta_{gk}). \tag{21}$$

So $\bar{P}^{-1}(V_g(k)) \approx \theta_{gk}$ by continuity; and

$$T_g(\bar{P}^{-1}(V_g(k))) \approx T_g(\theta_{gk}),$$

also by continuity. By definition of $S_g(v)$, this becomes $S_g(V_g(k)) \approx T_g(\theta_{gk})$, and thus
by (17),

$$E\bar{Y}_{gk} \approx S_g(V_g(k)). \tag{22}$$

Thus $\bar{Y}_{gk}$ is a reasonable estimator of $S_g(V_g(k))$ for each $k$. To transform (19) into
an interpolation involving $S_g(\cdot)$, we assume that $S_g(v)$ can be approximated by a linear
function in a small region about $V_g(k)$, and that $V_R(k)$ and $V_F(k)$ are close enough to
allow the approximation to be effective. Then, we interpolate $S_R(V_R(k))$ and $S_F(V_F(k))$
to their respective values at $V_k = \tfrac{1}{2}(V_R(k) + V_F(k))$:

$$S_g(V_k) = S_g(V_g(k)) + m^{*}_{gk} (V_k - V_g(k)), \tag{23}$$

where

$$m^{*}_{gk} = \frac{S_g(V_g(k+1)) - S_g(V_g(k-1))}{V_g(k+1) - V_g(k-1)}$$

is the approximate slope of $S_g(v)$ in the region of $V_g(k)$ and $V_k$. All of the above terms on
the right hand side of (23) are estimable; using $\bar{Y}_{gk}$ to estimate $S_g(V_g(k))$, we define the
adjusted $\bar{Y}_{gk}$:

$$\bar{Y}^{*}_{gk} = \bar{Y}_{gk} + \hat{M}_{gk} (\hat{V}_k - \hat{V}_g(k)) \tag{24}$$

where, recalling that the estimator $\hat{V}_g(x)$ is given in the Appendix,

$$\hat{M}_{gk} = \frac{\bar{Y}_{g,k+1} - \bar{Y}_{g,k-1}}{\hat{V}_g(k+1) - \hat{V}_g(k-1)}$$

and define $\hat{V}_k = \tfrac{1}{2}(\hat{V}_R(k) + \hat{V}_F(k))$. Because the right hand side of equation (24) is a good
estimator of the right hand side of (23), $\bar{Y}^{*}_{gk}$ is thus a good estimator of $S_g(V_k)$. Finally, $\bar{Y}^{*}_{gk}$
must be shown to be a good estimator of $T_g(\theta)$ at the same $\theta$ for both groups. By definition
of $S_g(v)$, $S_g(V_k) = T_g(\bar{P}^{-1}(V_k))$. If $\theta_{Rk}$ and $\theta_{Fk}$ are sufficiently close together then $\bar{P}(\theta)$
may be taken to be approximately linear in the neighborhood of $\theta_k = (\theta_{Rk} + \theta_{Fk})/2$. Thus,
using (21) and assuming approximate linearity of $\bar{P}$ in the neighborhood of $\theta_k$,


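$$\begin{aligned}
V_k &= \tfrac{1}{2}(V_R(k) + V_F(k)) \\
&\approx \tfrac{1}{2}(\bar{P}(\theta_{Rk}) + \bar{P}(\theta_{Fk})) \\
&\approx \bar{P}(\theta_k).
\end{aligned}$$

Thus, by the continuity of $\bar{P}(\theta)$,

$$\theta_k \approx \bar{P}^{-1}(V_k).$$

Hence, by the definition of $S_g(v)$,

$$S_g(V_k) = T_g(\bar{P}^{-1}(V_k)) \approx T_g(\theta_k).$$

Thus, because $\bar{Y}^{*}_{gk}$ has been shown to be a good estimator of $S_g(V_k)$, it is shown that
$\bar{Y}^{*}_{gk}$ is a good estimator of $T_g(\theta_k)$. Thus, $\bar{Y}^{*}_{Rk} - \bar{Y}^{*}_{Fk}$, as desired, is a good estimator of
$T_R(\theta_k) - T_F(\theta_k)$, i.e., of the difference of the marginal IRFs at the same $\theta$, establishing
the usefulness of the interpolation (19).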

(24) is called the regression correction for target ability difference. Thus, with the
correction (24) in place, (13) can be reconstructed, with

$$\hat{\beta}_U = \sum_{k=0}^{n} p_k (\bar{Y}^{*}_{Rk} - \bar{Y}^{*}_{Fk}) \tag{25}$$

and $B$ defined as in (15). Rejection of the hypothesis of no test bias ($H: \beta_U = 0$) occurs
when $B > z_\alpha$, where $P[N(0, 1) > z_\alpha] = \alpha$ defines $z_\alpha$. This procedure will be referred to
as the SIB procedure, “SIB” for simultaneous item bias.

Thus, the contribution to the differences $\bar{Y}_{Rk} - \bar{Y}_{Fk}$ due to target ability difference
has been eliminated. It is instructive to note that the correction (24) is the
sample analogue of (23), which is basically the interpolation (19), albeit on a different
latent scale (though the two latent scales, $\theta$ and $v$, are indistinguishable up to a monotonic
transformation).
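
The correction (24) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `regression_correct`, the dictionary layout keyed by "R" and "F", and the choice to leave the endpoint scores unadjusted are all hypothetical, not taken from the authors' code.

```python
def regression_correct(ybar, vhat):
    """Adjusted means Y*_gk in the spirit of eq. (24), for interior k only.

    ybar[g][k]: mean studied-subtest score of group g at valid score k.
    vhat[g][k]: estimated true-score regression V_g(k).
    Groups are keyed "R" and "F"; k = 0 and k = n are left as None
    because the two-sided slope needs both neighbours.
    """
    n = len(ybar["R"]) - 1
    adjusted = {"R": [None] * (n + 1), "F": [None] * (n + 1)}
    for k in range(1, n):
        # Common point V_k midway between the two groups' regressions.
        v_mid = 0.5 * (vhat["R"][k] + vhat["F"][k])
        for g in ("R", "F"):
            # Estimated slope of S_g(v) around V_g(k).
            slope = (ybar[g][k + 1] - ybar[g][k - 1]) / (
                vhat[g][k + 1] - vhat[g][k - 1]
            )
            adjusted[g][k] = ybar[g][k] + slope * (v_mid - vhat[g][k])
    return adjusted
```

If both groups share the same linear response function on the valid subtest scale, the adjusted means coincide at every interior k even when the unadjusted means differ, which is the behavior the correction is designed to produce.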

A  modification  of  the  basic  procedure  to  achieve  better  statistical  behavior 

Redefine $p_k$ to be the proportion of all examinees (focal and reference group) attaining
$X = k$. That is, $p_k = (J_{Fk} + J_{Rk}) / \sum_{j=0}^{n} (J_{Fj} + J_{Rj})$. Substitute this new $p_k$ into (25)
and (16) to obtain the statistic $B$ of (15). Because of a slightly better adherence in
simulation studies to the nominal level of significance when the hypothesis of no test bias
holds, this new choice of $p_k$ is recommended over the slightly more intuitive choice based
upon focal group examinees alone. The power performance of both versions of $B$ when
test bias was present was very similar. It is upon this version of the SIB statistic that our
simulation studies reported below are based.
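
As a small illustration of the pooled weights, the following Python sketch computes the weighted sum of per-score differences. The function name `beta_hat` and the argument layout are hypothetical, and for simplicity it assumes every score level k is included (no exclusion of sparse or extreme score groups).

```python
def beta_hat(diff, j_ref, j_foc):
    """Weighted sum of per-score differences using pooled weights p_k.

    diff[k]:  adjusted difference between reference and focal means at score k.
    j_ref[k]: number of reference group examinees with valid score k.
    j_foc[k]: number of focal group examinees with valid score k.
    """
    total = sum(j_ref) + sum(j_foc)
    return sum(
        (jr + jf) / total * d for jr, jf, d in zip(j_ref, j_foc, diff)
    )
```

For example, with counts (10, 30, 10) and (20, 20, 10) the pooled weights are 0.3, 0.5, and 0.2, so differences of 0.0, 0.1, and 0.2 give 0.09.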

SIMULATION  STUDY 

In  order  to  assess  the  performance  of  the  procedure  in  a  variety  of  testing  situations, 
a  moderate-sized  (84  simulation  cases)  simulation  study  was  performed.  Three  parameter 
logistic  item  parameters  actually  estimated  from  two  test  data  sets,  an  ACT  math  test 
(estimated  by  Drasgow  (1987))  and  an  ASVAB  auto  shop  test  (estimated  by  Mislevy  and 
Bock  (1984)),  are  used  to  specify  the  IRFs  in  the  IRT  model.  Univariate  and  bivariate 




normal ability distributions, appropriately centered relative to the test item parameters
(for the purpose of good measurability of target ability), are used for the focal and reference
groups. Two levels of bias and three levels of target ability difference are simulated; tests
with a single biased item and with three biased items are used in the simulations. The level
of guessing in the tests is varied. Finally, group size pairs of (3000, 3000), (3000, 1000),
and (1500, 1500) for the reference group and focal group examinees respectively are used.

Each  simulation  model  is  run  100  times  (trials).  For  a  particular  simulation  model,  the 
item  parameters  and  the  two  ability  distributions  for  the  two  groups  are  fixed;  however, 
at  each  trial,  a  new  set  of  examinees  (ability  parameters)  is  generated  from  the  ability 
distributions. 

When a single item is to be studied in a simulation, the Mantel-Haenszel procedure as
modified by Holland and Thayer is run in parallel in order to provide an external reference
against which to compare our procedure.


Item  parameters 


Estimated item parameters from the above mentioned tests were used to construct test
models; the ASVAB test length is 25, and the ACT test length is 40. Table 1 gives the summary
statistics for the a’s, b’s, and c’s as estimated by Mislevy and Bock and by Drasgow;
for the actual parameter values, see Mislevy and Bock (1984) and Drasgow (1987).


place  table  1  here 


The  test  for  each  simulation  was  generated  in  the  following  manner.  Let  N  denote 
test  length  and  nb  the  number  of  items  to  be  studied  for  possible  bias.  First,  nb  was  chosen 
to  be  either  1  or  3.  There  were  two  cases  to  consider. 

1.  No  bias:  unidimensional  items  are  used  for  the  entire  test. 

2.  Bias:  unidimensional  items  are  used  in  the  valid  subtest,  and  2-dimensional  items  are 
used  in  the  studied  subtest. 




place  table  2  about  here 


In the first case, $n_b$ of the $N$ items were chosen randomly to be the studied ones, and
the remainder were used as the valid subtest. In the second case, $n = N - n_b$ items were
chosen at random from either the ASVAB or the ACT test to be the valid subtest, and
the 2-dimensional studied item parameters were chosen according to Table 2. Note that
the studied item guessing parameters are a function of the average and standard deviation
of the guessing parameters on the ASVAB or ACT tests; the studied item a’s and b’s are
the same for both tests.

The IRFs for case 1 (no bias) are

$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + \exp(-1.7 a_{i\theta} (\theta - b_{i\theta}))}, \tag{26}$$

where $a_{i\theta}$ and $b_{i\theta}$ are the target discrimination and difficulty for item $i$. In case 2 (bias),
items 1 to $n$ were of the form (26), and items $n + 1$ to $N$ (studied items) had IRFs

$$P_i(\theta, \eta) = c_i + \frac{1 - c_i}{1 + \exp(-1.7 (a_{i\theta} (\theta - b_{i\theta}) + a_{i\eta} (\eta - b_{i\eta})))}, \qquad i = n + 1, \ldots, N. \tag{27}$$

The final factor in determining the item parameters was whether or not to include guessing;
that is, whether to assume 2PL or 3PL modeling. The presence of guessing is thought
to influence the performance of the procedure. Thus, in some simulation models, the
estimated $c_i$’s from the literature were used in conjunction with (26) and (27); in others,
all $c_i$’s were set to 0, producing a 2PL model. A detailed description of the experimental
design of the simulations follows.
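
The two item response functions can be written directly in Python. This is a minimal sketch: the function names `irf_3pl` and `irf_2d` and the argument names are hypothetical, and setting `c = 0` gives the 2PL (“no guessing”) variant.

```python
import math


def irf_3pl(theta, a, b, c=0.0):
    """Unidimensional logistic IRF of the form (26); c = 0 gives the 2PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def irf_2d(theta, eta, a_t, b_t, a_n, b_n, c=0.0):
    """Two-dimensional IRF of the form (27).

    a_t, b_t: target discrimination and difficulty.
    a_n, b_n: nuisance discrimination and difficulty.
    """
    z = a_t * (theta - b_t) + a_n * (eta - b_n)
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * z))
```

With the nuisance discrimination `a_n` set to 0, `irf_2d` reduces to `irf_3pl`, which matches the description of the unbiased items.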


Ability  distributions 

Specifying  the  ability  distributions  involves  choosing  the  five  parameters  determining 
the  bivariate  normal  distributions  for  each  group  in  such  a  way  to  meet  the  following  goals: 

1.  Introduce  a  specified  amount  of  group  difference  between  target  ability  distributions. 

2.  Require  the  test  to  measure  the  target  ability  well,  as  would  be  true  for  any  “good” 
test. 

3.  Introduce  a  specified  amount  of  potential  for  bias  into  the  distributions. 

4. In the case of 2-dimensional studied items (bias case), require that examinee nuisance
abilities be influential in determining the response to the item, e.g., that focal and
reference group examinees have moderate nuisance abilities.




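Each goal is elaborated upon separately below. The bivariate distribution for group $g$
($g = R$ or $F$) is denoted

$$\begin{pmatrix} \Theta \\ \eta \end{pmatrix} \Big|\; g \;\sim\; N\left( \begin{pmatrix} \mu_{\theta g} \\ \mu_{\eta g} \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right), \tag{28}$$

where $\rho = \mathrm{Corr}(\Theta, \eta \mid G = g)$ is taken to be the same for both groups ($\rho$ taken to be
different across groups tends to introduce bidirectional bias, where marginal IRFs in $\theta$ for
the two groups cross; see Shealy (1989)). Note that $\sigma^2(\Theta \mid g)$ and $\sigma^2(\eta \mid g)$ are taken to
be 1 in our study.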

Goal 1. We first define target ability difference. We need some notation; let $\alpha_R$ =
the proportion of the entire (conceptual) population of examinees who are reference group
members, and $\alpha_F = 1 - \alpha_R$ be the corresponding proportion for the focal group. (Note:
as $J_R$ and $J_F$ both increase to $\infty$, conceptually, $J_R/(J_R + J_F) \to \alpha_R$ and $J_F/(J_R + J_F) \to \alpha_F$. Here $J_g$
denotes the number of sampled Group $g$ examinees.) Define

$$d_T = \frac{\mu_{\theta R} - \mu_{\theta F}}{\sigma_{\theta P}} \tag{29}$$

to be the target ability difference between the focal and reference groups, where

$$\sigma^2_{\theta P} = \alpha_R \sigma^2(\Theta \mid R) + \alpha_F \sigma^2(\Theta \mid F). \tag{30}$$

Note that when (28) holds $\sigma^2_{\theta P} = 1$ and thus that $d_T = \mu_{\theta R} - \mu_{\theta F}$. $d_T$ is a quantity
specified in the simulations.

Goal 2. The criterion used to ensure good measurability of $\theta$ by the test is that the
average difficulty ($\bar{b}$) of the valid subtest should be close to the average target ability over
the pooled groups. Specifically, $\mu_{\theta R}$ and $\mu_{\theta F}$ are chosen so that

$$\bar{b} = E[\Theta] = \alpha_R \mu_{\theta R} + \alpha_F \mu_{\theta F}. \tag{31}$$

$\bar{b}$ is taken from Table 1. $\mu_{\theta R}$ and $\mu_{\theta F}$ are completely determined by specification of $d_T$
and (31).

Goal 3. We use a more restrictive version of Definition 1 to define potential for bias: set

$$C_\beta(\theta) = E[\eta \mid \Theta = \theta, G = R] - E[\eta \mid \Theta = \theta, G = F]. \tag{32}$$

$C_\beta(\theta) > 0$ is defined to be the potential for bias against the focal group. When (28) holds,
(32) becomes

$$\begin{aligned}
C_\beta(\theta) = C_\beta &= \mu_{\eta R} - \rho \mu_{\theta R} - (\mu_{\eta F} - \rho \mu_{\theta F}) \\
&= (\mu_{\eta R} - \mu_{\eta F}) - \rho (\mu_{\theta R} - \mu_{\theta F}) = (\mu_{\eta R} - \mu_{\eta F}) - \rho \, d_T,
\end{aligned} \tag{33}$$

$\theta$ dropping out because the ability correlation ($\rho$) is equal for both groups. Note that
because $C_\beta$ is constant for all $\theta$, unidirectional bias is being introduced. For a specified
amount of $C_\beta$, $\mu_{\eta R}$ and $\mu_{\eta F}$ are determined partially. The reader should note that potential
for bias can hold even though $\mu_{\eta R} = \mu_{\eta F}$, unless $\mu_{\theta F} = \mu_{\theta R}$.

Goal 4. The criterion used to ensure nuisance determinant influence is the following. The
nuisance difficulties for all studied items were chosen to be 0. For an arbitrarily chosen
target ability (say $\theta = 0$) we thus want the average nuisance ability to be near 0 as well.
Thus we choose

$$E[\eta \mid \Theta = 0, G = R] = -E[\eta \mid \Theta = 0, G = F], \tag{34}$$

i.e., the conditional nuisance expectation at $\Theta = 0$ is to be centered around the average
studied item nuisance difficulty of 0, for the reference and focal groups. Our intent in this
study was to introduce bias against the focal group, so $E[\eta \mid 0, R] > 0$ in (34) and thus we
get

$$0 < \mu_{\eta R} - \rho \mu_{\theta R} = -(\mu_{\eta F} - \rho \mu_{\theta F}); \tag{35}$$

this will specify $\mu_{\eta R}$ and $\mu_{\eta F}$, along with specification of $C_\beta$ in (33).

There is an additional issue here: how large should $C_\beta$ be chosen to introduce a
“moderate” or “severe” amount of bias into the 2-dimensional studied items of Table 2?
This is treated below, in the experimental design of the study.

Goals 1-4 now completely specify (28): $\mu_{\theta R}$, $\mu_{\theta F}$, $\mu_{\eta R}$, and $\mu_{\eta F}$ can be found by
solving (29), (31), (33), and (35) simultaneously for them. $\rho$, $\sigma^2(\Theta \mid g)$, and $\sigma^2(\eta \mid g)$ are
chosen: $\rho = .5$, and all $\sigma$’s are 1.
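
Because the four conditions are linear in the four means, they can be solved in closed form. The Python sketch below assumes unit variances (so that the target ability difference equals the difference in target means) and takes the reference group proportion as an input; the function name `group_means` and the default `alpha_r = 0.5` are hypothetical, since no value for that proportion is stated here.

```python
def group_means(d_t, b_bar, c_beta, rho=0.5, alpha_r=0.5):
    """Solve the four goal conditions for the group means.

    d_t:     target ability difference (difference of target means, unit variances).
    b_bar:   average valid subtest difficulty (pooled mean of target ability).
    c_beta:  potential for bias.
    rho:     common correlation between target and nuisance ability.
    alpha_r: proportion of reference group members (hypothetical default).
    Returns (mu_theta_R, mu_theta_F, mu_eta_R, mu_eta_F).
    """
    alpha_f = 1.0 - alpha_r
    # Mean difference d_t and pooled mean b_bar fix the target means.
    mu_theta_r = b_bar + alpha_f * d_t
    mu_theta_f = b_bar - alpha_r * d_t
    # Conditional nuisance means at theta = 0 are +c_beta/2 and -c_beta/2.
    mu_eta_r = rho * mu_theta_r + c_beta / 2.0
    mu_eta_f = rho * mu_theta_f - c_beta / 2.0
    return mu_theta_r, mu_theta_f, mu_eta_r, mu_eta_f
```

For instance, with a target ability difference of 1, pooled mean 0, potential for bias 0.2, correlation 0.5, and equal group proportions, the target means are 0.5 and -0.5 and the nuisance means are 0.35 and -0.35.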

Choice  of  Cp 

The amount of potential for bias $C_\beta$ in each simulation model was chosen so that the
actual level of bias $\beta_U$ produced was such that the power behavior of the statistic can be
well assessed for the given examinee sample sizes, valid subtest used (recall Table 1), and
biased items used (recall Table 2). These $\beta_U$ values (rounded to two significant figures)
are shown in Table 3. The governing equations determining $C_\beta$ from $\beta_U$ were

$$\beta_U = \int (T_R(\theta) - T_F(\theta)) f_F(\theta) \, d\theta,$$

where $f_F(\theta)$ is the focal group target ability density and

$$T_g(\theta) = \sum_{i=n+1}^{N} E[P_i(\Theta, \eta) \mid \Theta = \theta, G = g] \tag{36}$$

with $P_i(\theta, \eta)$ defined in (27) and the item parameters in (27) defined in Table 2, and the
parameters of the $(\Theta, \eta)$ distribution determined from (29), (31), (33), and (35).


place table 3 about here


One standard often used to interpret, from a practitioner’s viewpoint, the magnitude of the bias
is that the bias is “moderate” if $0.5 < \Delta_{MH} < 1$ while it is “large” if $\Delta_{MH} > 1$, where
$\Delta_{MH}$ is the theoretical index based on use of the Mantel-Haenszel log odds ratio proposed
by Holland and Thayer (1988). The rationales for $\Delta_{MH}$ and $\beta_U$ are different, but for $n_b = 1$
and unidirectional bias, they tend to be highly correlated and are crudely related by

$$\beta_U = \Delta_{MH}/10.$$

Thus, roughly, $0.05 < \beta_U < 0.1$ would constitute moderate bias while $\beta_U > 0.1$ would
constitute large bias. Thus in the $n_b = 1$ case, referring to Table 4, the amount of bias
being simulated is actually either (low) moderate or small. Examination of (36) shows that
$\beta_U$ is a measure of how much lower the probability of getting the biased item right is for
an average focal group examinee as compared with an average reference group examinee
of the same target ability. Thus $\beta_U$ has a natural and useful empirical interpretation. In
our context, $\Delta_{MH}$, by contrast, is a measure of horizontal distance between $T_R(\theta)$ and
$T_F(\theta)$ at $y = (1 + c)/2$ (i.e., the value of $T_F^{-1}((1 + c)/2) - T_R^{-1}((1 + c)/2)$), where $c$ is defined
in Table 1.
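
To make the link between the potential for bias and the resulting amount of bias concrete, the following Python sketch evaluates the bias index numerically for a single studied item. It rests on several assumptions that should be read as such: the index is taken to be the focal-density-weighted average of the difference between the two groups' marginal IRFs, both abilities have unit variance with common correlation, and the item parameters in the usage note are illustrative rather than the Table 2 values. All function names are hypothetical, and a plain midpoint rule is used in place of any particular quadrature scheme.

```python
import math


def normal_pdf(x, mean=0.0, sd=1.0):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid(lo, hi, m):
    """Midpoints and step width of m equal cells on [lo, hi]."""
    step = (hi - lo) / m
    return [lo + (i + 0.5) * step for i in range(m)], step


def t_g(theta, mu_theta, mu_eta, rho, item):
    """Marginal IRF: average the 2-dimensional IRF over eta given theta."""
    a_t, b_t, a_n, b_n, c = item
    mean = mu_eta + rho * (theta - mu_theta)  # conditional mean of eta
    sd = math.sqrt(1.0 - rho * rho)           # conditional sd of eta
    etas, step = grid(mean - 6 * sd, mean + 6 * sd, 200)
    total = 0.0
    for eta in etas:
        z = a_t * (theta - b_t) + a_n * (eta - b_n)
        p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * z))
        total += p * normal_pdf(eta, mean, sd) * step
    return total


def beta_u(means_r, means_f, rho, item):
    """Focal-weighted average of T_R(theta) - T_F(theta).

    means_r, means_f: (mu_theta, mu_eta) for the reference and focal groups.
    """
    thetas, step = grid(means_f[0] - 6.0, means_f[0] + 6.0, 200)
    return sum(
        (t_g(t, *means_r, rho, item) - t_g(t, *means_f, rho, item))
        * normal_pdf(t, means_f[0]) * step
        for t in thetas
    )
```

Calling `beta_u((0.5, 0.35), (-0.5, -0.35), 0.5, (1.0, 0.0, 0.8, 0.0, 0.2))` corresponds to a potential for bias of 0.2 and gives a small positive value, while means with zero potential for bias, such as `(0.5, 0.25)` and `(-0.5, -0.25)`, give a value of essentially zero. Multiplying the result by 10 gives a rough Mantel-Haenszel-scale figure under the crude relation quoted above.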


place  table  4  about  here 


Experimental  design 

The design is as follows. For the case of no test bias ($C_\beta = 0$), for each test type
(ASVAB Auto Shop or ACT Math) the following simulations are done:

    n_b = {1, 3}  x  d_T = {0.0, 0.5, 1.0}  D  {guessing, no guessing}  x  J_R/J_F = {3000/3000, 3000/1000, 1500/1500}

Here “guessing” means that the estimated ACT and ASVAB guessing parameters are used
in the model and “no guessing” means that all $c$’s are set to zero; that is, 2PL modeling
is used. Also, “D” means that this guessing “factor” is randomly assigned within the
36 levels produced by crossing the other factors.

For the case of test bias ($C_\beta > 0$) the following simulations are done for each test type:

    D  {guessing, no guessing}  x  J_R/J_F = {3000/3000, 3000/1000, 1500/1500}

For $n_b = 1$, the nuisance discrimination $a_{N\eta}$ of the studied item is .8; for $n_b = 3$, the
nuisance discrimination of each of the 3 studied items is .4. These discriminations were
chosen so that the power of the procedure could be well assessed (i.e., so that it would not
be too close to 1). It is informative to note in passing that the power of the procedure
is expected to be greater when $n_b$ is increased from 1 to 3 unless each item individually
displays less bias in the $n_b = 3$ case. This is why $a_{i\eta}$ ($i = N - 2, N - 1, N$) was chosen
to be .4 in the $n_b = 3$ case, 1/2 of that used in the $n_b = 1$ case.

There are therefore 48 simulation models that incorporate bias. Thus, a total of
84 simulation models were used in the simulation study.
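
The count of no-bias models can be checked by crossing the factors explicitly. The sketch below is illustrative only: the variable names are hypothetical, the factor levels are those described in the text for the no-bias case, and the seeded random assignment of the guessing factor is just one way to realize a random assignment.

```python
import random
from itertools import product

tests = ["ASVAB Auto Shop", "ACT Math"]
n_b_levels = [1, 3]
d_t_levels = [0.0, 0.5, 1.0]
group_sizes = [(3000, 3000), (3000, 1000), (1500, 1500)]

# Cross every factor except guessing: 2 x 2 x 3 x 3 cells.
no_bias_cells = list(product(tests, n_b_levels, d_t_levels, group_sizes))

# Guessing is not crossed; it is assigned at random within each cell.
rng = random.Random(0)  # fixed seed only to make the sketch reproducible
no_bias_models = [
    cell + (rng.choice(["guessing", "no guessing"]),) for cell in no_bias_cells
]

n_bias_models = 48  # number of bias models stated in the text
total_models = len(no_bias_models) + n_bias_models
```

The crossing yields 36 no-bias cells, which together with the 48 bias models gives the stated total of 84.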

RESULTS  OF  THE  SIMULATION  STUDY 

The results of the simulation study are given in Tables 5-8 and 9-12, with Tables 5-8
summarizing the no test bias simulations and Tables 9-12 summarizing the simulations
having test bias present. The $c$ column indicates whether the model has guessing present
or not. In all $n_b = 1$ cases, the Mantel-Haenszel rejection rate for the hypothesis of no item
bias (based on 100 trials) is reported in the MH column. In all cases the SIB rejection rate
is reported in the SIB column. In all cases where test bias is present (Tables 9-12), the
$C_\beta$ column presents the amount of potential for bias present (recall (33)); the $\beta_U$ column
presents our index of the amount of bias present against the focal group in the model
(recall (6)); $\bar{\hat{\beta}}_U$ is the average of the estimates $\hat{\beta}_U$ of $\beta_U$ over the 100 trials; the $\Delta_{MH}$
column presents the amount of bias present against the focal group in the model from the
Mantel-Haenszel perspective.

Tables  5-8  indicate  that  both  the  SIB  statistic  and  the  MH  statistic  display  reasonable 
adherence  to  the  nominal  level  of  significance  of  0.05.  There  appear  to  be  situations  of 
no  bias,  which  have  a  target  ability  difference  and  which  depart  from  the  Rasch  model, 
where  the  Mantel-Haenszel  procedure  displays  inflated  Type  1  error.  (See  Zwick  (1990), 
for  a  discussion  of  this  problem  and  an  illustrative  example.)  There  is  evidence  that 
in  such  situations  (Shealy  (1989)),  the  SIB  statistic  adheres  closely  to  the  nominal  level 
of  significance.  On  the  other  hand  there  are  likely  portions  of  the  “parameter  space” 
of realistic IRT models where our linear regression correction is stressed and hence the
MH  would  likely  display  better  Type  1  error  performance.  More  study  is  required  before 
it  can  be  claimed  that  either  MH  or  SIB  displays  superior  Type  1  error  performance. 
The  striking  fact  is  that  both  procedures  seem  to  be  quite  robust  against  the  inflating 
Type  1  error  effect  of  differing  target  ability  distributions.  In  this  regard,  dT  =  1  from  the 
practitioner’s  perspective  is  certainly  a  large  amount  of  target  ability  difference. 

Tables  9  and  11  indicate  that  both  the  SIB  statistic  and  the  MH  statistic  are  quite 
powerful  against  moderate  amounts  of  bias  and  fairly  powerful  against  small  amounts  of 
bias  in  a  single  biased  item.  Untabulated  simulation  studies  for  larger  amounts  of  bias 
produced  rejection  rates  of  essentially  unity  for  both  the  SIB  and  MH  procedures. 

Tables 10 and 12 indicate that the SIB procedure is quite powerful against moderate
amounts of bias resulting from several (3 here) items producing bias in the same direction.
The reader should recall that the amount of bias/item was lowered for the $n_b = 3$ case by
reducing the discrimination in the nuisance dimension from $a_{N\eta} = 0.8$ to $a_{i\eta} = 0.4$ for the
studied items. In both the $n_b = 1$ and $n_b = 3$ cases, the potential for bias as measured
by $C_\beta$ was kept the same ($C_\beta = 0.2$ or 0.3). These two tables show, as claimed, that the
SIB procedure can successfully detect simultaneous item bias, even if the amount of bias
present per item is small.

Tables 9 and 11 show, for the particular bias models of the simulation study, that SIB
is somewhat more powerful than MH, averaging 0.07 higher for those models for which
rejection rates are < 0.9. We do not know whether this greater SIB power generalizes to
other models of bias.

Tables 9-12 provide evidence about the ability of $\hat{\beta}_U$ to estimate $\beta_U$, our measure of
the amount of bias present. For each case $\bar{\hat{\beta}}_U$ is an indicator of the amount of statistical
bias one might expect in using $\hat{\beta}_U$. Clearly statistical bias of roughly +0.01 is present.
The estimated standard errors for $\hat{\beta}_U$ are not recorded, but averaged (roughly) about 1/3
of $\hat{\beta}_U$. Thus if $\hat{\beta}_U = 0.05$ there is likely a bias of 0.01 and a standard error of 0.017. Thus,
crudely, a 95% confidence interval (if asymptotic normality is a good approximation) would
be given by 0.04 ± 0.028. Here 0.04 = 0.05 − 0.01 is the correction for statistical bias. It
would seem that $\hat{\beta}_U$ provides a useful empirical index of the amount of bias present in a
statistical subtest of items; more work is planned in studying its theoretical and empirical
properties.

SUMMARY  AND  CONCLUSIONS 

The  SIB  procedure  was  designed  to  test  for  unidirectional  test  bias  residing  in  one  or 
more  items,  using  the  conception  that  test  bias  is  incipient  within  the  two  groups’  ability 
distributions  (in  terms  of  a  difference  in  conditional  nuisance  ability  distributions).  By 
means  of  the  regression  correction  presented  here,  the  inflation  of  the  SIB  test  statistic 
due  to  target  ability  difference  (one  group  having  a  stochastically  larger  distribution  of  0) 
is  extracted.  This  correction  represents  a  conceptual  link  between  conditional-on-observed- 
score  methods  and  IRT-based  methods,  just  as  the  practice  of  including  the  studied  item 
in  the  comparable  examinee  criterion  in  the  Mantel-Haenszel  procedure  of  Holland  and 
Thayer (1988) does. The correction adjusts the studied subtest scores for the two groups so
that they are now estimates of the same latent IRT ability in the case of no test bias, even if
group target ability differences exist. It is useful to note that the adjustment, although conceptually
based  upon  multidimensional  IRT  modeling,  is  in  fact  computed  using  a  classical  approach 
and  hence  does  not  depend  on  IRT  ability  or  item  parameter  estimation. 

A  moderate  (84  models)  simulation  study  shows  that  both  MH  and  SIB  display  good 
adherence to the nominal level of significance, even for large ($d_T = 1$) target ability differences.
In the case of a single biased item, both MH and SIB display good power with SIB
displaying  slightly  higher  power.  As  designed,  the  SIB  statistic  displays  good  power  in  the 
case  of  several  biased  items  (3  here),  even  when  the  amount  of  bias/item  is  fairly  small. 

A large scale simulation study is in progress with the goal of obtaining a better
understanding of the performance characteristics of both the SIB and the MH statistics with
particular emphasis on investigation of statistical power and adherence to the nominal
level of significance. Based upon the completed portion of this simulation study reported
herein, we would recommend that practitioners use the SIB and MH statistics simultaneously.
Both are extremely easy to compute and for moderate sized data sets run quickly on
a typical PC configuration. Carefully checked code with a user oriented driver is available
from the authors for running both the SIB and MH statistics on real data sets and also
for doing simulation studies of performance.




APPENDIX 


1. Derivation of $\hat{V}_g(k)$, the estimated regression of true on observed valid
subtest score, for $k = 0, \ldots, n$.

Recall that $V_g(k) = E[\bar{P}(\Theta) \mid k, g]$ needs to be estimated in order for $S_g(V_k)$ of (23)
to be estimated. Suppressing $g$ for simplicity, we need to estimate $V(k)$ at $k = 0, 1, \ldots, n$.
Although $V(k)$ is not necessarily linear in $k$ (see Shealy (1989), p. 87ff for a discussion),
as an approximation we assume $nV(k)$ is linear in $k$; i.e.,

$$nV(k) = \alpha + \beta k.$$

To estimate $V(k)$, we consider the true score model for the valid subtest score $X$:


$$X = T + e \tag{A1}$$

where

$$E(e) = 0, \qquad \mathrm{cov}(T, e) = 0 \tag{A2}$$

is assumed and the true score $T$ has the latent variable representation $T = n\bar{P}(\Theta)$. Thus

$$nV(k) = E[T \mid k].$$

Standard regression theory for $E(T \mid k)$ yields

$$V(k) = \frac{1}{n} \left[ ET + \rho_{XT} \frac{\sigma_T}{\sigma_X} (k - EX) \right]. \tag{A3}$$

But, for the true score model given by (A1) and (A2),

$$\rho_{XT} \frac{\sigma_T}{\sigma_X} = 1 - \frac{\sigma^2(e)}{\sigma^2(X)} \tag{A4}$$

is well known (see page 61 of Lord and Novick (1968)). Using (A1) and (A2), $ET = EX$
holds. Thus, by (A3) and (A4),

$$V(k) = \frac{1}{n} \left[ EX + \left( 1 - \frac{\sigma^2(e)}{\sigma^2(X)} \right) (k - EX) \right] \tag{A5}$$

holds.

Clearly $EX = E[X \mid g]$ can be estimated by the average valid subtest score $\bar{X}_g$
of all Group $g$ examinees taking the test. Thus it remains to estimate $\sigma^2(e)/\sigma^2(X)$.




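$\sigma^2(X) = \sigma^2(X \mid g)$ can clearly be estimated by the usual sample variance estimate of all
Group $g$ examinees taking the test,

$$\hat{\sigma}^2(X \mid g) = \frac{1}{J_g - 1} \sum_{j=1}^{J_g} (X_{gj} - \bar{X}_g)^2, \tag{A6}$$

where $J_g$ denotes the number of Group $g$ examinees taking the test and $X_{gj}$ is the valid
subtest number correct score of the $j$th such Group $g$ examinee. It remains to estimate
$\sigma^2(e)$; denote this estimator by $\hat{\sigma}^2(e)$. Then the desired estimator of $\sigma^2(e)/\sigma^2(X)$ will be
given by $\hat{\sigma}^2(e)/\hat{\sigma}^2(X)$. A standard conditioning formula yields, indexing the valid subtest
items by $i = 1, 2, \ldots, n$, and setting $X_g = X \mid g$, $\Theta_g = \Theta \mid g$ as a reminder that sampling
here is from Group $g$ only,

$$\begin{aligned}
\sigma^2(X \mid g) = \sigma^2(X_g) &= \sigma^2(E[X_g \mid \Theta_g]) + E[\sigma^2(X_g \mid \Theta_g)] \\
&= \sigma^2(n\bar{P}(\Theta_g)) + \sum_{i=1}^{n} E[P_i(\Theta_g)(1 - P_i(\Theta_g))],
\end{aligned} \tag{A7}$$

using the standard item response theory assumption of local independence of items, given $\Theta$.
Also, by (A2) it is trivial that

$$\sigma^2(X \mid g) = \sigma^2(n\bar{P}(\Theta) \mid g) + \sigma^2(e \mid g).$$

Thus, by (A7),

$$\sigma^2(e \mid g) = \sum_{i=1}^{n} E[P_i(\Theta_g)(1 - P_i(\Theta_g))].$$

This suggests

$$\hat{\sigma}^2(e \mid g) = \sum_{i=1}^{n} \bar{U}_{gi} (1 - \bar{U}_{gi}), \tag{A8}$$

where $\bar{U}_{gi}$ is the proportion correct for Group $g$ examinees for valid subtest item $i$. Thus,
using (A5), we will estimate $V_g(k)$ by

$$\hat{V}_g(k) = \frac{1}{n} \left[ \bar{X}_g + \left( 1 - \frac{\hat{\sigma}^2(e \mid g)}{\hat{\sigma}^2(X \mid g)} \right) (k - \bar{X}_g) \right]. \tag{A9}$$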
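
The estimator can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function name `v_hat` and its arguments are hypothetical, and the sample variance uses the divisor J − 1, which is an assumption about what “the usual sample variance estimate” means here.

```python
def v_hat(valid_scores, item_props, n):
    """Estimated regression of true on observed valid score for k = 0..n.

    valid_scores: valid subtest number-correct scores of one group's examinees.
    item_props:   proportion correct in that group for each valid subtest item.
    n:            number of valid subtest items.
    """
    j = len(valid_scores)
    x_bar = sum(valid_scores) / j
    # Sample variance of the observed valid score (divisor j - 1 assumed).
    var_x = sum((x - x_bar) ** 2 for x in valid_scores) / (j - 1)
    # Error variance estimated from the item proportions correct.
    var_e = sum(u * (1.0 - u) for u in item_props)
    shrink = 1.0 - var_e / var_x
    return [(x_bar + shrink * (k - x_bar)) / n for k in range(n + 1)]
```

With scores 0 through 4 on a four-item subtest and every item proportion equal to 0.5, the mean is 2, the observed variance is 2.5, the error variance is 1, and the shrinkage factor is 0.6, so the estimates at k = 0, 2, 4 are 0.2, 0.5, and 0.8.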


2. The complete procedure to detect test bias, using the proposed regression
correction.

The SIB procedure in its entirety is presented here. First we set some basic notation.
Group g (g = R or F) has $J_g$ examinees taking the test of N items. The response to item i
of the jth group g examinee is $U_{gij}$. The subtest scores are

$$X_{gj} = \sum_{i=1}^{n} U_{gij} \ \text{(valid subtest score)}, \qquad Y_{gj} = \sum_{i=n+1}^{N} U_{gij} \ \text{(studied subtest score)}.$$




The classical group item difficulties are $\bar U_{gi} = (1/J_g)\sum_{j=1}^{J_g} U_{gij}$. Let $\sum_{j:\,X_{gj}=k}$ denote summation
over those group g examinees j with k correct on the valid subtest.
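The notation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each examinee's responses are stored as a list of 0/1 values with the n valid subtest items first, and the helper names `subtest_scores`, `item_difficulties` and `sum_at_score` are hypothetical, chosen for this example.

```python
def subtest_scores(responses, n):
    """Return (X, Y): valid and studied subtest scores for each examinee.

    responses: list of rows of 0/1 item scores for one group;
    the first n items of each row form the valid subtest (assumed layout).
    """
    X = [sum(row[:n]) for row in responses]   # valid subtest score X_gj
    Y = [sum(row[n:]) for row in responses]   # studied subtest score Y_gj
    return X, Y


def item_difficulties(responses):
    """Classical item difficulties: proportion correct on each item in the group."""
    J = len(responses)
    N = len(responses[0])
    return [sum(row[i] for row in responses) / J for i in range(N)]


def sum_at_score(X, values, k):
    """Sum `values` over those examinees whose valid subtest score equals k."""
    return sum(v for x, v in zip(X, values) if x == k)
```

For example, `sum_at_score(X, Y, k)` gives the total studied subtest score among examinees with k correct on the valid subtest, and `sum_at_score(X, [1] * len(X), k)` counts those examinees.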

1.  Compute  Jgk,  the  number  of  group  g  examinees  with  k  correct  on  the  valid  subtest. 

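2. Compute

$$\bar Y_{gk} = \frac{1}{J_{gk}}\sum_{j:\,X_{gj}=k} Y_{gj}, \qquad S^2_{gk} = \frac{1}{J_{gk}-1}\sum_{j:\,X_{gj}=k}\left(Y_{gj} - \bar Y_{gk}\right)^2.$$

If $J_{gk} = 0$, set $\bar Y_{gk} = 0$; if $J_{gk} \le 1$, set $S^2_{gk} = 0$. $\bar Y_{gk}$ is the sample average studied
subtest score of group g examinees attaining $X_g = k$, and $S^2_{gk}$ is the sample variance.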

3. Compute $\hat P_g(k) = J_{gk}/J_g$, for both groups and all k. $\hat P_g(k)$ is the estimate of the
histogram of $X \mid G = g$. Then compute $\hat P^*_g(k)$, the MLE of the unimodalized histogram
of $X \mid G = g$, over the class of all unimodal histograms with n + 1
possible values ($X \mid G = g$ is assumed to have a unimodal distribution and hence its
estimate $\{\hat P^*_g(k),\ k \ge 0\}$ should also be unimodal). For details of this procedure, using
the up-and-down-blocks algorithm, see Barlow et al. (1972; pp. 72-73; pp. 223-231).

4. Set I(k) = 1 for all k unless either

(a) k = 0 or n,

(b) $S^2_{Rk} = 0$ or $S^2_{Fk} = 0$,

(c) $J_R \hat P^*_R(k) < J_{\min}$ or $J_F \hat P^*_F(k) < J_{\min}$, where $J_{\min}$ is set by the user, usually around 30,
or

(d) $k < n c_\gamma$, where $c_\gamma > 0$ is the user-specified global guessing parameter for the
test. (It is assumed that there is a relatively constant level of guessing across
items, and that there is at least partial knowledge of this guessing value.)

I(k), k = 0, ..., n, is the examinee inclusion indicator; it is 1 if examinees with
X = k are to have their responses included in the test statistic, and 0 otherwise. (a) excludes the two
extreme valid subtest scores because of their poor estimation of target ability. The
(b) exclusion is obvious. The (c) exclusion is done to assure that each valid subtest
score category has enough examinees to make $\bar Y_{Rk}$ and $\bar Y_{Fk}$ approximately normal; the
unimodal mass function is used so that only extreme valid subtest score categories are
excluded. As for (d), all valid scores below that expected by guessing are excluded.

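5. Compute the regression of true score on valid subtest score:

(a) $\bar U^*_{gi}$ = [formula illegible in this copy]. If the result is < 0, set it to 0 (adjustment for guessing).

(c) $\hat\sigma^2(X \mid g) = \frac{1}{J_g}\sum_{j=1}^{J_g}\left(X_{gj} - \bar X_g\right)^2$

(d) $\hat\sigma^2(e \mid g) = \sum_{i=1}^{n} \bar U^*_{gi}\left(1 - \bar U^*_{gi}\right)$

(e) $\hat b_g = \frac{n}{n-1}\left(1 - \frac{\hat\sigma^2(e \mid g)}{\hat\sigma^2(X \mid g)}\right)$

(f) $\hat V_g(k) = \frac{1}{n}\left(\bar X_g + \hat b_g(k - \bar X_g)\right)$ for both g and k = 0, ..., n.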

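6. Make the regression correction:

(a) $k_l = \min\{k : I(k) = 1\}$, $k_r = \max\{k : I(k) = 1\}$.

(b) $\hat V_k = \tfrac{1}{2}\left(\hat V_R(k) + \hat V_F(k)\right)$, for $k_l \le k \le k_r$.

(c) For $k_l < k < k_r$, compute

$$\hat M_{gk} = \frac{\bar Y_{g,k+1} - \bar Y_{g,k-1}}{\hat V_g(k+1) - \hat V_g(k-1)}.$$

Then compute $\bar Y^*_{gk} = \bar Y_{gk} + \hat M_{gk}\left(\hat V_k - \hat V_g(k)\right)$.

(d) For $k = k_l$ and $k = k_r$, compute $\bar Y^*_{gk}$ in the following way.

i. Define

$$S_g(v) = \begin{cases}
(1 - t)\,\bar Y_{g,k} + t\,\bar Y_{g,k+1} & \text{if } \hat V_g(k) \le v \le \hat V_g(k+1), \\
\bar Y_{g0} & \text{if } v < \hat V_g(0), \\
\bar Y_{gn} & \text{if } v > \hat V_g(n),
\end{cases}$$

and

$$t = \frac{v - \hat V_g(k)}{\hat V_g(k+1) - \hat V_g(k)}.$$

$S_g(v)$ is the linear interpolation of $\{\bar Y_{g0}, \ldots, \bar Y_{gn}\}$.

ii. Compute

$$\bar Y^*_{gk} = S_g(\hat V_k)$$

for $k = k_l$ and $k = k_r$.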

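7. Compute the bias statistic.

(a) Compute $J^*_g = \sum_{k=0}^{n} I(k) J_{gk}$, the number of included group g examinees.

(b) Compute

$$B = \frac{\sum_{k=0}^{n} \hat p_k \left(\bar Y^*_{Rk} - \bar Y^*_{Fk}\right) I(k)}{\left(\sum_{k=0}^{n} \hat p_k^{\,2}\left(\frac{S^2_{Rk}}{J_{Rk}} + \frac{S^2_{Fk}}{J_{Fk}}\right) I(k)\right)^{1/2}}, \qquad \hat p_k = \frac{J_{Rk} + J_{Fk}}{J^*_R + J^*_F}.$$

(c) Reject $H: \beta_U = 0$ in favor of $\beta_U > 0$ at level $\alpha$ if $B > z_\alpha$, where $P[N(0,1) > z_\alpha] = \alpha$
defines $z_\alpha$.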
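Steps 6 and 7 can be sketched as follows, given the per-score summaries from the earlier steps. This is a minimal Python illustration under explicit assumptions, not the authors' implementation. The function names are hypothetical, each $\hat V_g(k)$ sequence is assumed strictly increasing in k, and the weights are assumed to be pooled across the two groups, $(J_{Rk} + J_{Fk})/(J^*_R + J^*_F)$, because the weighting in this copy of the report is hard to read.

```python
import math
from statistics import NormalDist


def interpolate(v, V, Y):
    """S_g(v): linear interpolation of Y over knots V (V assumed strictly increasing)."""
    if v < V[0]:
        return Y[0]
    if v > V[-1]:
        return Y[-1]
    for k in range(len(V) - 1):
        if V[k] <= v <= V[k + 1]:
            t = (v - V[k]) / (V[k + 1] - V[k])
            return (1 - t) * Y[k] + t * Y[k + 1]


def corrected_means(Y_bar, V_g, V_bar, include):
    """Step 6 for one group: regression-corrected means (None where I(k) = 0)."""
    ks = [k for k, inc in enumerate(include) if inc]
    k_l, k_r = ks[0], ks[-1]
    Y_star = [None] * len(Y_bar)
    for k in ks:
        if k_l < k < k_r:
            # Interior scores: local slope of Y_bar against V_g, step 6(c).
            M = (Y_bar[k + 1] - Y_bar[k - 1]) / (V_g[k + 1] - V_g[k - 1])
            Y_star[k] = Y_bar[k] + M * (V_bar[k] - V_g[k])
        else:
            # End scores k_l and k_r: interpolation, step 6(d).
            Y_star[k] = interpolate(V_bar[k], V_g, Y_bar)
    return Y_star


def sib_statistic(Ystar_R, Ystar_F, S2_R, S2_F, J_R, J_F, include):
    """Step 7(b) with pooled weights (an assumption, see the lead-in)."""
    ks = [k for k, inc in enumerate(include) if inc]
    total = sum(J_R[k] + J_F[k] for k in ks)   # J*_R + J*_F
    beta, var = 0.0, 0.0
    for k in ks:
        w = (J_R[k] + J_F[k]) / total
        beta += w * (Ystar_R[k] - Ystar_F[k])
        var += w * w * (S2_R[k] / J_R[k] + S2_F[k] / J_F[k])
    return beta / math.sqrt(var)


def reject_null(B, alpha=0.05):
    """Step 7(c): one-sided test against the upper standard normal quantile."""
    return B > NormalDist().inv_cdf(1 - alpha)
```

Here `V_bar[k]` is the average of the two groups' $\hat V_g(k)$ values from step 6(b). Included scores have positive sample variances by exclusion (b), so the per-score counts in the denominators are at least 2.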


References 


Ackerman, T. (1991). A didactic explanation of item bias, item impact, and item validity
from a multidimensional IRT perspective. Submitted for publication and presented
at 1991 annual AERA/NCME joint meeting.

Ansley, T.N. and Forsyth, R.A. (1985). An examination of the characteristics of unidimensional
IRT parameter estimates derived from two-dimensional data. Applied
Psychological Measurement 9, 37-48.

Barlow, R., Bartholomew, D., Bremner, J., and Brunk, H. (1972). Statistical Inference
under Order Restrictions. New York: John Wiley.

Drasgow, F. (1987). A study of measurement bias of two standard psychological tests.
Journal of Applied Psychology 72, 19-30.

Hambleton, R.K. and Swaminathan, H. (1985). Item Response Theory: Principles and
Applications. Boston: Kluwer-Nijhoff Publishing.

Holland, P.W. and Thayer, D.T. (1988). Differential item functioning and the Mantel-Haenszel
procedure. In H. Wainer and H.I. Braun (Eds.), Test Validity (pp. 129-145).
Hillsdale, New Jersey: Lawrence Erlbaum.

Kok, F. (1988). Item bias and test multidimensionality. In R. Langeheine and J. Rost
(Eds.), Latent Trait and Latent Class Models (pp. 263-275). New York: Plenum Press.

Lautenschlager, G. and Park, D. (1988). IRT item bias detection procedures: issues of
model mis-specification, robustness, and parameter linking. Applied Psychological
Measurement 12, 365-376.

Linn, R.L. and Harnisch, D. (1981). Interactions between item content and group membership
on achievement test items. Journal of Educational Measurement 18, 109-118.

Linn, R., Levine, M., Hastings, C., and Wardrop, J. (1981). Item bias on a test of reading
comprehension. Applied Psychological Measurement 5, 159-173.

Lord, F.M. (1980). Applications of Item Response Theory to Practical Testing Problems.
Hillsdale, New Jersey: Lawrence Erlbaum.

Lord, F.M. and Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Reading,
Massachusetts: Addison-Wesley.

Mellenbergh, G.J. (1982). Contingency table methods for assessing item bias. Journal of
Educational Statistics 7, 105-118.

Millsap, R.E. and Meredith, W. (1989). The Detection of DIF: Why There is No Free
Lunch. Paper presented at the Annual Meeting of the Psychometric Society, University
of California at Los Angeles, July 6-9, 1989.

Mislevy, R.J. and Bock, R.D. (1984). Item operating characteristics of the Armed Services
Vocational Aptitude Battery (ASVAB), Form 8A. Office of Naval Research Technical Report
(N00014-83-C-0283).

Shealy,  R.T.  (1991).  Assessment  of  the  Shealy-Stout  test  bias  statistic:  a  simulation  study. 
In  preparation. 




Shealy, R.T. (1989). An Item Response Theory-Based, Statistical Procedure for Detecting
Concurrent Internal Bias in Ability Tests. Unpublished doctoral dissertation, University
of Illinois, Urbana-Champaign.

Shealy, R.T. and Stout, W.F. (1991). An Item Response Theory Model for Test Bias
(Technical Report 4421-548 under ONR grant N00014-90-J-1940). Champaign-Urbana:
Department of Statistics, University of Illinois. (A 1989 version of this was widely
distributed; it will appear, by invitation, in Differential Item Functioning, Theory and
Practice, 1992, Hillsdale, New Jersey: Erlbaum.)

Thissen, D., Steinberg, L., and Wainer, H. (1988). Use of item response theory in the
study of group differences in trace lines. In H. Wainer and H.I. Braun (Eds.), Test
Validity (pp. 147-169). Hillsdale, New Jersey: Lawrence Erlbaum.

Zwick, R. (1990). When do item response function and Mantel-Haenszel definitions of
differential item functioning coincide? Journal of Educational Statistics 15, 185-197.




Figure 1. Stochastically ordered and unordered pairs of distributions


Figure 2. Prior and posterior target ability distributions


[Figure 3 axis label: fundamental latent scale]
Figure 3. The three latent scales.


Figure 4. The valid subtest to studied subtest transformation


Table 1: Means and sds for the ASVAB and ACT item parameters used in the study.


Test 

5 

8 

b 

c 

■ 

N 

ASVAB  auto/shop 

m  | 

0.7 

0.09 

IS 

uni 

25 

ACT  math 

Wroa 

0.35 

0.5 

ESI 

B3 

EES 

40 

Table 2: Item parameters for 2-dimensional studied items in the bias case.


Hi 

Item  No. 

die 

ESI 

Cf 

1 

N 

1.0 

0.0 

O.S 

0.0 

c 

3 

N  —  2 

0.6 

-0.3 

0.4 

0.0 

KBE9 

N~1 

O.S 

0.0 

0.4 

0.0 

c 

N 

1.0 

0.3 

0.4 

0.0 

usm 

Table  3:  Equivalence  table  for  bias  potential  and  actual  test  bias. 


E29 

- 

liM 

0.8 

0.05 

3 

- 

3 

lifl 

0.06 

3 

0.4 

0.09 

Table  4:  Equivalence  of  A/,/#  and  fiu  when  n\,  —  1,  using  item  parameters  of  Table  2. 


$c_\beta$    c's used      $\Delta_{MH}$    $\beta_U$
0.0    -             0       0
0.2    0.0           .27     0.034
0.2    actual c's    .27     0.026
0.3    0.0           .40     0.051
0.3    actual c's    .39     0.039

Table  5:  No  bias,  ACT,  m  =  1,  a  =  0.05. 


Jf 

Jr 

d 

dx 

MH 

SIB 

■Mil 

1500 

a 

•0 

.03  j 

•07 

1000 

3000 

u 

.0 

.00 

.02 

3000 

3000 

a 

.0 

.09 

.06 

■Mil 

1500 

□ 

.5 

.04 

.04 

1000 

3000 

a 

.5 

.10 

.10 

3000 

3000 

c 

.5 

.05 

.03 

1500 

1500 

c 

1.0 

.02 

.05 

1000 

3000 

C 

1.0 

.05 

.10 

3000 

3000 

a 

1.0 

.06 

.09 

Table  6:  No  bias,  ACT,  rn  =  3,  a  =  0.05. 


Jf 

Jr 

D 

SIB 

1500 

1500 

a 

m 

.05 

1000 

3000 

D 

HI 

.02 

3000 

3000 

D 

HI 

.07 

1500 

■Mil 

O 

.5 

.OS 

1000 

3000 

B 

.5 

.07 

3000 

3000 

B 

.5 

.05 

1500 

1500 

c 

1.0 

.06 

1000 

3000 

c 

1.0 

.16 

3000 

3000 

a 

1.0 

.09 

Table  7:  No  bias,  ASVAB,  n*  =  1,  a  =  0.05. 


Jf 

Jr 

1500 

1500 

1000 

3000 

3000 

3000 

1500 

■Mil 

1000 

3000 

3000 

3000 

1500 

1500 

1000 

3000 

3000 

3000 

MH 

SIB 

.OS 

.07 

.04 

.04 

.06 

.06 

.13 

.14 

.04 

.03 

.05 

.04 

.07 

.02 

.15 

.09 

.11 

.01 

Table  S:  No  bias,  ASVAB,  nj,  =  3,  ct  =  0.05 


Jf 

Jr 

1500 

1500 

1000 

3000 

3000 

3000 

1500 

1500 

1000 

3000 

3000 

3000 

1500 

1500 

1000 

3000 

3000 

3000 

Table  9:  Bias,  a,  =  0.8,  ACT,  ?ij,  =  1,  a  =  0.05. 


mm 

Jr 

c 

dx 

Cp 

fiu 

R 

A  MH 

MH 

SIB 

1500 

1500 

c 

T 

.2 

.026 

.032 

.27 

.46 

.58 

1000 

3000 

IQI 

JO 

m\ 

.032 

.042 

.27 

.64 

.70 

3000 

3000 

UKIEJI 

.032 

.035 

.27 

.91 

.95 

1500 

1500 

DDDl 

.02P 

.035 

.27 

.51 

.60 

1000 

3000 

HI 

.5 

.2 

.034 

.044 

.27 

.65 

.72 

3000 

3000 

nr 

.5 

m\ 

.034 

.038 

.27 

.91 

.94 

1500 

1500 

HI 

0 

mi 

.048 

.052 

.40 

.84 

.90 

1000 

3000 

IQHI 

.3 

.042 

.053 

.40 

.87 

.91 

3000 

3000 

au 

.3 

.042 

.045 

.40 

.97 

1.00 

1500 

1500 

□El 

.3 

.050 

.047 

.40 

.99 

.99 

1000 

3000 

rn 

•51 

|  -3 

.042 

.054 

.40 

.SO 

.84 

3000 

3000 

OBJ 

.3 

.042 

.064 

.40 

.91 

.92 

Table  10:  Bias,  a,  =  0.4,  ACT,  =  3,  a  s  0.05. 


Jf 

Jr 

1500 

1500 

1000 

3000 

3000 

3000 

1500 

1500 

1000 

3000 

3000 

3000 

1500 

1500 

1000 

3000 

3000 

3000 

1500 

1500 

1000 

3000 

3000 

3000 

& 

SIB 

.063 

.069 

.70 

.053 

.067 

.68 

.053 

.053 

.SO 

.055 

.071 

.60 

.065 

•0S3 

.72 

.065 

.074 

.96 

.093 

.095 

.91 

.093 

.11 

.89 

.OSO 

.081 

.99 

.097 

.12 

.97 

.0S4 

■HI 

.89 

.0S3 

.09 

1.00 

Table  11:  Bias,  a,  =  0.8,  ASVAB,  n*  =  1,  a  =  0.05. 


J> 

Jr 

c 

■<T 

Cfi 

fiu 

& 

MH 

SIB 

1500 

1500 

c 

0 

.2 

.026 

.029 

.27 

.42 

.50 

1000 

3000 

Dl 

0 

.2| 

.034 

.039 

.27 

.63 

.79 

3000 

3000 

im 

0 

m 

.034 

.034 

.27 

.90 

.95 

1500 

1500 

mm 

.2 

.027 

.035 

.27 

.63 

.66 

1000 

3000 

Hi 

.034 

.03S 

.27 

.63 

.70 

3000 

3000 

IDI 

.5 

.2 

.034 

.036 

.27 

.89 

.91 

1500 

1500 

Dl 

0 

.3 

.051 

.052 

.40 

.85 

.92 

1000 

3000 

mm 

.3 

.042 

.044 

.40 

.77 

.84 

3000 

3000 

rn 

0 

.3 

.042 

.046 

.40 

.99 

.99 

1500 

1500 

QB 

Hi 

.051 

.057 

.40 

.91 

.93 

1000 

3000 

c 

.5 

.3 

.038 

•04S 

.40 

.77 

.82 

3000 

3000 

c 

.5 

i  .3 

.039 

.045 

.40 

.94 

.97 

Table  12:  Bias,  a,  =  0.4,  ASVAB,  n*  =  3,  a  =  0.05. 


Jf 

Jr 

Jcl 

dT 

Cfi 

Pu 

A. 

SIB 

1500 

1500 

0 

.2 

.065 

.067 

.70 

1000 

3000 

IE3KI 

.2 

.052 

.056 

.53 

3000 

3000 

c 

m 

.052 

.053 

.85 

1500  | 

1500 

c 

.5 

.2 

.052  ; 

.068 

.63 

1000 

3000 

IDI 

.5 

.2 

.064  j 

.0S3 

.73 

3000 

3000 

iai 

.2 

.064 

.072 

.92 

1500 

1500 

mm 

.3 

.098 

.10 

.94 

1000 

3000 

01 

.3 

.097 

.10 

.97 

3000 

3000 

lauKi 

.079 

.079 

.98 

1500 

1500 

UEJKJ 

.097 

.011 

.98 

1000 

3000 

Dl 

Al 

.3 

.076 

.098 

.87 

3000 

3000 

DEI  HI 

.07S 

.090 

.99 



